Online Expansion of Largescale Data Warehouses
Abstract
Modern data warehouses store exceedingly large amounts of data, generally considered the crown jewels of an enterprise. The amount of data maintained in such data warehouses increases significantly over time, often at a continuous pace, e.g., by gathering additional data or retaining data for longer periods to derive additional business value, but occasionally also precipitously, e.g., when consolidating disparate data warehouses and data marts into a single database. Expanding a data warehouse that holds hundreds of terabytes of data by a substantial portion, e.g., 100% or more, is a complex and disruptive maintenance operation, as it typically involves some form of dumping and reloading of data, which requires substantial downtime. In this paper we describe the methodology and mechanisms we developed in Greenplum Database to expand large-scale data warehouses in an online fashion, i.e., without noticeable downtime. At the core of our approach is a set of robust and transactionally consistent primitives that enable efficient data movement. Special emphasis was put on usability and control, letting an administrator tailor the expansion process to specific operational characteristics via priorities and schedules. We present a number of experiments to quantify the impact of an ongoing expansion on query workloads.
Similar resources
Issues for On-Line Analytical Mining of Data Warehouses
Data warehouses and OLAP engines are expected to be widely available in the near future. The data in data warehouses has been cleansed, integrated, and preprocessed, and infrastructures have been built surrounding data warehouses for efficient data analysis. Therefore, data warehouses or OLAP databases are expected to be a major platform for data mining in the future. We discuss the issues relate...
Meta Cube-X: An XML Metadata Foundation for Interoperability Search among Web Data Warehouses
OLAP (Online Analysis Processing) applications have very special requirements for the underlying multidimensional data that differ significantly from those of other application areas (e.g. the existence of highly structured dimensions). In addition, providing access and search among multiple, heterogeneous, distributed and autonomous data warehouses, especially web warehouses, has become one of the l...
Online Data Mining
Currently, most data warehouses are being used for summarization-based, multi-dimensional, online analytical processing (OLAP). However, given the recent developments in data warehouse and online analytical processing technology, together with the rapid progress in data mining research, industry analysts anticipate that organizations will soon be using their data warehouses for soph...
ASM Ground Model and Refinement for Data Warehouses
Data Warehouses and on-line analytical processing (OLAP) systems are a promising area for the application of Abstract State Machines (ASMs). In this paper a ground model specification for data warehouses is sketched that is based on the fundamental idea of separating input from operational databases and output to OLAP systems. On this basis we start defining formal refinement rules for such sys...
Data Warehousing Applications: an Analytical Tool for Decision Support System
Data-driven decision support systems, such as data warehouses, can meet the need to extract information from more than one subject area. Data warehouses standardize the data across the organization so as to have a single view of information. Data warehouses (DW) can provide the information required by the decision makers. The data warehouse supports an on-line analytical processing...
Journal title:
- PVLDB
Volume 4
Publication date: 2011